정보과학회논문지 (Journal of KIISE)


Korean Title: 모델 전문화를 위한 조건부 지식 증류 기법
English Title: Conditional Knowledge Distillation for Model Specialization
Author(s): 김학빈 (Hakbin Kim), 최동완 (Dong-Wan Choi)
Citation: Vol. 48, No. 4, pp. 369-376 (Apr. 2021)
Korean Abstract (translated):
Research on knowledge distillation-based neural network compression has recently been very active. However, when a user wants to classify only a subset of the teacher model's classes, conventional knowledge distillation also transfers unnecessary information, which is inefficient. In addition, conventional knowledge distillation requires the data used to train the teacher model, which can be a serious constraint due to privacy and similar issues. This paper therefore proposes a conditional knowledge distillation method that trains a specialized student model to classify only specific classes among the teacher model's classes, together with an extension of the method to the data-free setting. For the case where the user has collected only a small amount of data, a combination of the two distillation methods is also proposed. The specialized student models trained with the proposed methods achieved higher accuracy than student models trained with conventional knowledge distillation, and even in the data-free setting they achieved higher accuracy than data-based knowledge distillation in most experiments.
English Abstract:
Many recent works on model compression in neural networks are based on knowledge distillation (KD). However, since the basic goal of KD is to transfer the entire knowledge set of a teacher model to a student model, the standard KD may not represent the best use of the model's capacity when a user wishes to classify only a small subset of classes. Also, it is necessary to possess the original teacher model dataset for KD, but for various practical reasons, such as privacy issues, the entire dataset may not be available. Thus, this paper proposes conditional knowledge distillation (CKD), which only distills specialized knowledge corresponding to a given subset of classes, as well as data-free CKD (DF-CKD), which does not require the original data. As a major extension, we devise Joint-CKD, which jointly performs DF-CKD and CKD with only a small additional dataset collected by a client. Our experimental results show that the CKD and DF-CKD methods are superior to standard KD, and also confirm that joint use of CKD and DF-CKD is effective at further improving the overall accuracy of a specialized model.
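The abstract describes the method only at a high level, so the short PyTorch sketch below is merely an illustration of the general idea of distilling a teacher's knowledge restricted to a chosen subset of classes. It is not the paper's implementation: the function name, subset indices, temperature, and loss weighting are hypothetical, and restricting and renormalizing the teacher's softmax over the subset is just one plausible way to realize "conditional" distillation.

```python
# Minimal sketch of subset-restricted (conditional) knowledge distillation.
# All hyperparameters and the masking strategy are illustrative assumptions,
# not the settings or code of Kim and Choi (2021).
import torch
import torch.nn.functional as F

def ckd_loss(student_logits, teacher_logits, subset, labels=None,
             temperature=4.0, alpha=0.9):
    """KD loss that transfers only the teacher's knowledge about `subset`.

    student_logits: (B, |subset|) logits of the specialized student.
    teacher_logits: (B, C) logits of the full teacher.
    subset: LongTensor of class indices the student specializes in.
    labels: optional (B,) ground-truth indices *within the subset*.
    """
    # Keep only the teacher outputs for the target classes and renormalize,
    # so no knowledge about irrelevant classes is transferred.
    t_sub = teacher_logits.index_select(1, subset)
    soft_targets = F.softmax(t_sub / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_student, soft_targets,
                  reduction="batchmean") * temperature ** 2
    if labels is None:                      # distillation-only (no labeled data)
        return kd
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce  # combined with a small labeled set

if __name__ == "__main__":
    torch.manual_seed(0)
    subset = torch.tensor([3, 7, 21])            # hypothetical target classes
    teacher_logits = torch.randn(8, 100)         # e.g. a 100-class teacher
    student_logits = torch.randn(8, len(subset), requires_grad=True)
    labels = torch.randint(0, len(subset), (8,))
    loss = ckd_loss(student_logits, teacher_logits, subset, labels)
    loss.backward()
    print(float(loss))
```

In this reading, the data-free variant would replace the real inputs with synthetic samples generated from the teacher, while the joint variant mixes such synthetic batches with the client's small collected dataset; the sketch only covers the loss computation shared by these settings.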
Keywords: knowledge distillation, model compression, model specialization, data-free machine learning
Attachment: PDF download